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Abstract — We consider the problem of estimating a rank-one 
matrix in Gaussian noise under a probabilistic model for the 
left and right factors of the matrix. The probabilistic model can 
impose constraints on the factors including sparsity and positivity 
that arise commonly in learning problems. We propose a simple 
iterative procedure that reduces the problem to a sequence 
of scalar estimation computations. The method is similar to 
approximate message passing techniques based on Gaussian 
approximations of loopy belief propagation that have been used 
recently in compressed sensing. Leveraging analysis methods by 
Bayati and Montanari, we show that the asymptotic behavior of 
the estimates from the proposed iterative procedure is described 
by a simple scalar equivalent model, where the distribution 
of the estimates is identical to certain scalar estimates of the 
variables in Gaussian noise. Moreover, the effective Gaussian 
noise level is described by a set of state evolution equations. The 
proposed method thus provides a computationally simple and 
general method for rank-one estimation problems with a precise 
analysis in certain high-dimensional settings. 

Index Terms — approximate message passing, belief propaga- 
tion, compressed sensing, dictionary learning, low-rank matrices, 
matrix factorization. 



I. Introduction 

We consider the problem of estimating vectors Uo € 
and v G K" from a matrix A g M mx " f the form 



T 

u v 



mW, 



(1) 



where W represents some unknown noise and ^/m is a 
normalization factor. The problem can be considered as a rank- 
one special case of finding a low-rank matrix in the presence 
of noise. Such low-rank estimation problems arise in a range 
of applications including blind channel estimation [1 1, antenna 
array processing Q, subspace system identification [3|, and 
principal component or factor analysis (4). 

When the noise term W is zero, the vector pair (uq, Vo) can 
be recovered exactly up to a scaling, from the maximal left 
and right singular vectors of A (5). However, in the presence 
of noise, the rank-one matrix can in general only be estimated 
approximately. In this case, a priori information or constraints 
on (uo,Vq) may improve the estimation. Such constraints 
arise, for example, in factor analysis in statistics, where one 
of the factors is often constrained to be either positive or 
sparse [6|. Similar sparsity constraints arise in the problem 
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of dictionary learning 0. In digital communications, one of 
the factors could come from a discrete QAM constellation. 

Unfortunately, optimal estimation with constraints on Uo 
or Vo is often computationally intractable. The chief problem 
is the bilinear nature of the term Uov^. However, the term 
is linear in Uo and Vo separately. Thus, many suboptimal 
estimation methods are performed in an iterative manner, 
alternately estimating Uo and vo individually, while holding 
the estimate of the other factor constant. 

This paper proposes a variant of such alternating optimiza- 
tion procedures that we call Iterative Factorization (IterFac), 
stated in detail in Algorithm[T]below. Through proper selection 
of relevant parameters, the IterFac algorithm can perform 
alternating minimizations to optimizations of the form 



(S,v) 



arg mm 

u£l m ,v£l' 



F(u,v) 



(2) 



where the objective function is given by 

In. 



F(u,v) 
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+ C[/(u) + cv(v). (3) 



Here, ||X||i? is the Frobenius norm and C[/(u) and cy(v) 
are cost or regularization functions on the left and right 
factors. In the case when the cost functions are separable, the 
IterFac algorithm reduces the vector optimization problem to 
a sequence of scalar optimization problems on the individual 
components of u and v; it is thus computationally simple. The 
IterFac methodology is also genearl since the method can be 
applied to essentially arbitrary separable cost functions. 

Of course, such iterative algorithms are by no means new; 
they underlie many existing methods, including the classic 
alternating power method for finding maximal singular values 
[|5| and some alternating methods in sparse or non-negative 
dictionary learning ir7l- lfTTl . More recently, an alternating 
method has been proposed for estimation with matrix un- 
certainties lfl2l . also using an approximate message passing 
technique. The IterFac iterations also have some similarities 
to the updates in gradient descent methods in sparse dictionary 
learning such as in ||T3l . 

Our main contribution is to show that, under a particular 
setting of a damping parameter, the IterFac algorithm admits 
an asymptotically-exact characterization when W is i.i.d. 
Gaussian noise and the components of the true vectors Uo and 
vo have limiting empirical distributions. In this scenario, we 
show that the empirical joint distribution of the components of 
Uo and the corresponding estimates from the IterFac algorithm 
are described by a simple scalar equivalent model where 
the IterFac component estimates are identically distributed 
to scalar estimates of the variables corrupted by Gaussian 
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noise. Moreover, the effective Gaussian noise level in this 
model is described by a simple set of scalar state evolution 
(SE) equations. From the scalar equivalent model, one can 
compute the asymptotic value of almost any component- 
separable metric including mean-squared error or correlation. 
Thus, in addition to being computationally simple and general, 
the IterFac algorithm admits a precise analysis in the case of 
Gaussian noise. Moreover, since fixed points of the IterFac 
algorithm correspond, under suitable circumstances, to local 
minima of objectives such as (0, the analysis can be used to 
characterize the behavior of such minima — even if alternate 
algorithms to IterFac are used. 

The main analytical tool is a recently-developed technique 
by Bayati and Montanari 11141 used in the analysis of ap- 
proximate message passing (AMP) algorithms. AMP methods 
are Gaussian approximations of loopy belief propagation for 
estimation of vectors under large random linear measurements. 
The work [14] applied AMP techniques to compressed sens- 
ing, and proved that, in the limit for large Gaussian mixing 
matrices, the behavior of AMP estimates can be described by 
a scalar equivalent model with effective noise levels defined 
by certain scalar state evolution (SE) equations. Similar SE 
analyses have appeared in related contexts [ 15 1-|20|. To prove 
the SE equations for the IterFac algorithm, we apply a key 
theorem from |14| with a simple change of variables and a 
slight modification to account for parameter adaptation. A 
conference version of this paper appeared in ET1 . This paper 
provides all the proofs along with more detailed discussions 
and simulations. 

II. Iterative Rank-One Factorization 

For a matrix A £ K mx ", we consider the following iterative 
algorithm for estimating the rank-one factors Uo and Vo. 

Algorithm 1 Iterative Factorization (IterFac) 

Require: Matrix A £ R mxn and factor selection functions 

G„(t,p,A u ) and G v (t,q,X v ). 
1: t <- 

2: Select initial values u(0), v(0) 
3: repeat 

4: {Update estimate of u} 

5: Select parameters X u (t) and p u (t) 

6: p(i) <- (l/m)Av(t) + Hu(t)u(i) 

1: u(t+l)^G u (t,p(t),X u (t)) 



{Update estimate of v} 
Select parameters X v (t) and p v (t) 
q(t) <- (l/m)A T u(t + l) + /j, v (t)v(t) 
v(t+l) «-G„(t,q(t),A,,(t)) 
until Terminate 



The output of the algorithm, (u(t), v(t)), t = 0, 1, . . ., is a 
sequence of estimates for (uo, Vo). The algorithm has several 
parameters including the initial conditions, the parameters in 
lines [5] and [9] the termination condition and, most importantly, 
the functions G u (-) and G v (-). In each iteration, the functions 
G u (-) and G v ( ) are used to generate the estimates of the 



factors u(t) and v(t). G u (-) and G v (-) will thus be called the 
factor selection functions. 

To understand the role of the factor selection functions, 
suppose that we wish to perform the optimization (O for some 
regularization functions cjj(-) and cy(-). Consider the factor 
selections given by the minimization 



G u (t,p,X u ) 



arg mm 

uGR m 



T 

-p U 



C[/(u) 



G v (t,q, X v ) 



arg mm 

vgE" 



q v + cy(v) 



where the parameters A u (t) and A„ (t) are given by 

X u (t) := n u (t) + \\v(t)\\ 2 /m, 
X v (t) := fi v (t) + ||u(t+l)|| 2 /m. 



(4a) 



(4b) 



(5a) 
(5b) 



Lemma 1: Consider the outputs of Algorithm Q] with the 
factor selection functions (@), parameters X u (t) and X v (t) in 
(0. Then, for any initial conditions and parameters selections 

H u (t) and n v (t), 



u(t+l) = argmini< 1 (u, v(t)) 

u 



+ • 



|u-u(t)|| 



v(t+l) = argminF(u(i+l), v) 

V 

+ ll v - v WII ■ 



(6a) 



(6b) 



In addition, if \i u (t) and fi v (t) > 0, then the objective function 
is monotonically decreasing 



f(u(t+l),v(t + l))<f(u(t),v(t)). 



(7) 



Proof: See Appendix lAl ■ 
To understand the lemma, first suppose /i u (t) = /J, v (t) = 0. 
In this case, © show that the factors u(t+l) and v(t+l) are 
selected to minimize the objective function F(-) in the opti- 
mization (O. Thus, IterFac can be interpreted as minimizing 
the objective function, one factor at a time. Positive values of 
Hu(t) and fi v (t) add additional terms in the minimizations in 
(O that attempt to keep the new columns u(t+l) and v(t+l) 
close to their previous values, u(t) and v(t), and can thus 
be interpreted as a damping in the iterative minimization. We 
thus call fi u (t) and fj, v (t) the damping factors. Equation (O 
states that for any positive damping, the objective function 
monotonically decreases. Under mild additional assumptions, 
such as F(u, v) being bounded below, one can show that the 
estimates (u(t),v(t)) converge to a local minimum of the 
objective. 

A. Separable Functions 

Our interest in the IterFac algorithm will mostly be in the 
case when the cost functions C(j(u) and cy(v) are separable 
in that they are of the form 

m n 
c C/(u) =22c ui (ui), Cyr(v) = } J C vj (Vj), (8) 

i=l j=l 
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where c U i(ui) and c v j(vj) are cost functions on the individual 
components. In this case, the factor selection functions can be 
implemented by componentwise minimizations 



Gu{t, p, A u )i 
G v (t, q, A„)j 



arg min 
arg min 



-PiUi 



C-ui (w>i) 
Cvj{Vj) 



X u 2 
X v 2 



(9a) 



(9b) 



We will see that a similar separability property holds for 
MMSE estimation under the assumption of independent priors 
(see Section IPV-BT i. 

In the separable case, the IterFac algorithm reduces the 
vector-valued bilinear optimization to a sequence of scalar 
minimizations and matrix transforms. The complexity is low: 
Assuming the complexity of each scalar minimization does 
not grow with the dimension, the complexity of all the scalar 
minimizations together will be 0(m + n) per iteration. The 
complexity of the matrix multiplications will be 0(mn), so 
the total cost per iteration is 0(mn). In addition, we will see 
from the state evolution analysis in Section UlI-BI that the per- 
component performance as a function of the iteration number 
converges as n and m — > oo. Therefore, to maintain constant 
per-component performance, the number of iterations will not 
grow with dimension. Hence the total cost of the IterFac algo- 
rithm is 0(mn) — the same as a matrix multiplication. Thus, 
the IterFac algorithm offers the possibility, under separability 
assumptions on the cost functions, of computationally fast 
implementations for a large class of problems. 

III. Asymptotic Analysis under Gaussian Noise 

A. Model and Assumptions 

We analyze the algorithm under the following assumptions. 

Assumption 1: Consider a sequence of random realizations 
of the estimation problem in Section Q] indexed by the dimen- 
sion n. The matrix A and the parameters in Algorithm[T]satisfy 
the following: 

(a) For each n, the output dimension m = m(n) is deter- 
ministic and scales linearly with the input dimension in 
that 

lim n/m(n) = j3 (10) 

71— J-OO 

for some (3 > 0. 

(b) The matrix A has the form ([TJ where Uo € R m and 
vq € K™ represent "true" left and right factors of a rank 
one term, and W £ jj™x« j s an y ^ Q auss j an matrix 
with components Wij ~ Af(0,T w ) for some t w > 0. 

(c) The factor selection functions G u (t,p,X u ) and 
G v (t, q, A„) in lines [7] and [TT] are componentwise 
separable in that for all component indices i and j, 



G u {t, p, A u )j 
G v (t, q, X v )j 



G u (t,Pi,X u ), (11a) 
G v (t, qj ,X v ), (lib) 



for some scalar functions G u (t,p, X u ) and G v (t, q, X v ). 
The scalar functions must differentiable in p and q. 
Moreover, for every t, the functions G u (t,p, X u ) and 
dG u (t,p,X u )/dp must be Lipschitz continuous in p 
with a Lipschitz constant that is continuous in X u , and 



continuous in X u uniformly over p. Similarly, for every t, 
the functions G v (t,q, A&) and dG v (t,q,X v )/dq must be 
Lipschitz continuous in q with a Lipschitz constant that 
is continuous in A„, and continuous in X v uniformly over 

q- 

(d) The damping factors are selected by the rules 

71 ft 

Mu(*+1) = -T^y2—G v (t, qj (t),X v (t))a2a) 



fJL v (t) 



j= i 

m „ 

T ^ w Gu{t,pMMtmv 



with the initial damping factor fi u (0) = 0. 

(e) The parameters X u (t) and X v (t) are computed via 

1 ™ 

X u (t) = -}^x v (t,v j,Vj(t)) (13a) 
U j=i 

^ m 

K{t) = — y^Au(t,uoj,Ui(H-l)) (13b) 

i=l 

for (possibly vector-valued) functions 4>\ u ( ) and (j>\ v (-) 
that are pseudo-Lipschitz continuous of order p for some 
P > 2. 

(f) For each n and iteration number t, define the sets 

6 u {t) = {(u i0 ,Ui(t)),i = l,...,m}, (14a) 
v (t) = {(v jQ ,v 3 (t)),j = 1, ...,n}. (14b) 

There the sets for t — empirically converge with 
bounded moments of order p to the limits 



lim U (O) 
lim 0„(O) 



(U o ,U(0)), 
(Vo,V(0j), 



(15a) 
(15b) 



for some random variable pairs (Uo, U(0) and (Vq, V"(0)). 

See Appendix |B] for a precise definition of the empirical 

convergence used here. 
The assumptions need some explanations. Assumptions QJ a) 
and (b) simply state that we are considering an asymptotic 
analysis for certain large matrices A consisting of a ran- 
dom rank one matrix plus Gaussian noise. The analysis of 
Algorithm [T] for higher ranks is still not known, but we 
provide some possible ideas later. Assumption [He) is a mild 
condition on the factor selection functions. For example, as 
discussed above, the separability condition would occur if we 
use the factor selection functions (01 with element separable 
cost functions. Assumption (d) provides a precise proscription 
for the damping factors. Interestingly, these factors may be 
negative. 

Assumption (e) allows for the parameters X u (t) and X v (t) in 
the factor selection functions to be data dependent, provided 
that they can each be determined via empirical averages of 
some function of the most recent data. Note that the functions 
in (0 are of the form (13[ , and therefore the factor selection 
functions (01 would satisfy these assumptions. As discussed 
in Section [VT] it is possible that these parameters can also be 
used to estimate unknown parameters in the distributions of 
the components of Uq and vq, if they are not known. 
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To understand Assumption [Hf), note that the set 9 u (t) is the 
set of components of the "true" left factor Uo and the corre- 
sponding components in the estimate u(t) after t iterations of 
AlgorithmQ] Similarly, 9 v (t) represents the components of the 
"true" right factor Uo and its estimate v(t). Following Bayati 
and Montanari's analysis in |[T4l we will treat the variables as 
deterministic. However, the condition ( TT3T > requires that empir- 
ical distribution of the initial conditions asymptotically behave 
like some random vector pairs (Uo, U(0)) and (Vo, V(0)). 

B. Main Result 

Our main result, Theorem Q] below, will provide a charac- 
terization of the asymptotic empirical distribution for the sets 
9 u (t) and 9 v (t) in (TBI . Specifically, we will show that, for all 
t, the sets have empirical limits of the form 



lim 9 u {t) = (U Q ,U(t)), 



lim 9 v (t) 



(Vo,V(t)), 



(16a) 
(16b) 



for some random variable pairs (Uo, U(t) and (Vo, V(t)). The 
distributions of the random variables (Uo, U(i) and (Vo, V(t)) 
can be described recursively in t as follows: For t = 0, 
(Vo,V(0)) is the random variable pair in d!5bl ) in the initial 
condition Assumption |TJf). Then, for t > 0, (Uo, U(t + 1)) is 
given by 



U(t+1) = G u (t,P(t),\ u (t)), 
P(t) = pa vl (t)U +Z u (t), 
Z u (t) ~ N(Q,PT w a v o(t)), 



(17a) 
(17b) 
(17c) 

where Uo is the random variable in (1 1 5ab that describes the 
empirical limit of the components of the true vector uo; 
G u (-) is the scalar function in Assumption [Tfd); and Z u (t) is 
Gaussian noise independent of Uo- Thus, U(t+1) is distributed 
identically to the output of the scalar function G u (t, P(t)), 
where P(t) is a scaled and Gaussian noise-corrupted version of 
the true variable Uo- Similarly, for t > 0, the random variable 
(V , V(t + l)) in ( fT6bl is given by 

V(t + l) = G v (t,Q(t),\ v (t)), (18a) 
Q(t) = a ul (t + l)Vo + Z v (t), (18b) 
(t) ~ N(Q ,T w a u o(t + l)), (18c) 

where Vo is the empirical limit in (1 1 5bb that describes the limit 
of the true vectors vo, and Z v (t) is Gaussian noise independent 
of V . 

The constants a u o(t), a u \(t), a v o(t), a v i(t), X u (t), and 
X v (t) in dTTb and ( TT8l are deterministic sequences of scalars 
defined recursively by 

a u o(t) = E [U(tf] , a ul (t) = E[U U(t)} , (19a) 

a v0 (t)=E[V(t) 2 ], a vl (t)=E[V V(t)}, (19b) 

and 

X u (t) = E[4> Xu (Vo,V(t))], (20a) 

X v (t) = E[<f> Xv (U ,U(t+l))}, (20b) 

which are initialized with the values a v o(0) and a v i(0) from 
the joint distribution of (V ,V(0)) in the limit ( I15ab . The 



subsequent values of a u0 (t), a u \(t), a v0 (t), a v \(t), \ u (t) 
and X v (t) can then be computed recursively through (fT9b along 
with the definition of the variables in (fTTI i and i 181 , In line 
with [14] upon which our analysis is based, we will call the 
set of recursive equations (ITTb — (f20b the state evolution (SE) 
equations. From the solution to the SE equations, we get the 
exact descriptions of the limits in (fTo*b . 

Theorem 1: Under Assumption [TJ the sets 9 u (t) and 9 v (t) 
in dT4b converge empirically with bounded moments of order 
p with the limits in ([T6l >. 

Proof: See Appendix iDl ■ 

C. Scalar Equivalent Model 

The main contribution of Theorem Q] is that it provides 
a simple scalar equivalent model for the asymptotic behav- 
ior of the algorithm. The sets 9 u (t) = {(uoi, Ui(t))} and 
9v(t) — {(voj,Vj(t))} in (TBI are the components of true 
vectors uo and vo and their estimates u(i) and v(i). The 
theorem shows that empirical distribution of these components 
are asymptotically equivalent to simple random variable pairs 
(Uo,U(t)) and (V ,V(t)) given by (HT^i and (fl8al) . In this 
scalar system, the variable U(t+1) is the output of the factor 
selection function G u (t,-) applied to a scaled and Gaussian 
noise-corrupted version of the true variable Uo- Similarly, 
V(t+1) is the output of the factor selection function G v (t, •) 
applied to a scaled and Gaussian noise-corrupted version of 
the true variable Vq. Following ||22l . we can thus call the result 
a single-letter characterization of the algorithm. 

From this single-letter characterization, one can exactly 
compute a large class of performance metrics of the algorithm. 
Specifically, the empirical convergence of 9 u (t) shows that 
for any pseudo-Lipschitz function 4>(uo,u) of order p, the 
following limit exists almost surely: 



lim — 

n— >oo m 



E 



ct>(uio,Ui(t)) =E[<j>(Uo,U(t))\, (21) 



where the expectation on the right-hand side is over the vari- 
ables (Uo,U(t)) with Uo identical to the variable in the limit 
in d 1 5at and U(t) given by ( 117a| ). This expectation can thus be 
explicitly evaluated by a simple two-dimensional integral and 
consequently any component-separable performance metric 
based on a suitably continuous loss function <j)(uo,u) can 
be exactly computed. 

For example, if we take <fi(uo,u) = (u — uo) 2 we can 
compute the asymptotic mean squared error of the estimate, 

1 1 m 

lim — ||u(t) - u || 2 = lim — S^(ui(t) - u. l0 ) 2 

n—>oo xxi n—>oo xfl ^ — 4 

= E[(Uo-U(t)) 2 ]. (22) 
Also, for each t, define the empirical second-order statistics 



a u0 (t) = -\\u(t)\\ 2 , 
m 

a v o(t) = -||v(£)|| 2 , 



„i(t) = -u(i) T u , (23a) 
m 

1 



a vl (t) = -v(t) J v . (23b) 

n n 

Since ||u(i)|| 2 = J2i u i(t) 2 > ^ follows that a u0 (t) — > 
E(U(t) 2 ) almost surely as n —> oo. In this way, we obtain 
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that the following limits hold almost surely: 

lim a u0 (t) = a u0 (t), lim a u i{t) = a u i(i),(24a) 

n— too n— too 

lim a v0 (t) = a v0 (t), lim a vl (t) = a vl (t). (24b) 

n — ► oo n — ^oo 

We will also use definitions 

t u :=E[U§], t„ :— W*[Vq] . (25) 

From the second order statistics, we can compute the asymp- 
totic correlation coefficient between u and its estimate u 
given by 



Pu(t) ■-- 



= lim 



Hm |u(t)?u | 2 

n->oo ||u(t)|| 2 ||uo|| 2 

|(u(i) T u )/m| 2 



n^oo (||u(i)||2/m)(||uo|| 2 /m) 

[E(U(t)U Q )] 2 = c? ul (t) 
EU(t)mu$ a u0 (t)T u 



(26) 



Similarly, the asymptotic correlation coefficient between Vo 
and v has a simple expression 



p v (t) := lim 



|v(t) T v | 5 



a 2 vl (t) 



rwoo ||v(i)|| 2 ||v || 2 a v0 (t)T v ' 



(27) 



The correlation coefficient is useful, since we know that, 
without additional constraints, the terms Uo and vo can only 
be estimated up to a scalar. The correlation coefficient is scale 
invariant. 

IV. Examples 
A. Linear Selection Functions 

As a first simple application of the SE analysis, suppose we 
use linear selection functions of the form 



G u {t, p, A u ) — A u p, G v (t, q, A„) — A t ,q, 



(28) 



where the parameters A u and A^ allow for normalization or 
other scalings of the outputs. Linear selection functions of the 
form ( |28l > arise when one selects G u (-) and G v (-) from (0]i 
with zero cost functions, c;j(u) = cy(v) = 0. With zero 
cost functions, the correct solution to the optimization (O 
is for (u, v) to be the (appropriately scaled) left and right 
maximal singular vectors of A. We will thus call the estimates 
(u(i), v(t)) of Algorithm [T] and linear selection functions (|28l l 
the estimated maximal singular vectors. 

Theorem 2: Consider the state evolution equations ( fTTI i. 
(fT8l . and ( fT9] l with the linear selection functions (|28V Then: 

(a) The asymptotic correlation coefficients ( f26T > and (l27l i 
satisfy the following recursive rules: 



Pu(t+l) = 
Pv{t) = 



/3r u T v p v (t) 

pT u T v p v (t) + T u 

T u T v Pu(t) 
TuTv Pu{i) ~t~ T w 



(29a) 
(29b) 



(b) For any positive initial condition, p v (0) > 0, the asymp- 
totic correlation coefficients converge to the limits 



lim p u (t) = p* := ( 
t^oo t u t v (Pt u t v 



2 T 21 

'w] -|_ 



+ T w ) ■ 



lim p v (t) 

t— foo 



Pv 



(30a) 



(30b) 



where [x] + — max{0,x}. 

Proof: See Appendix [E] ■ 
The theorem provides a set of recursive equations for the 
asymptotic correlation coefficients p M (t) and p v (t) along with 
simple expressions for the limiting values as t —> 00. We thus 
obtain exactly how correlated the estimated maximal singular 
vectors of a matrix A of the form ([T]i are to the rank one factors 
(uo,Vq). The proof of the theorem also provides expressions 
for the second-order statistics in ( fT9] l to be used in the scalar 
equivalent model. 

The fixed point expressions (l30i > agree with the more general 
result in ll23l that derive the correlations for ranks greater than 
one and low -rank recovery with missing entries. An interesting 
consequence of the expressions in (f30b is that unless 



(31) 



the asymptotic correlation coefficients are exactly zero. The 
ratio t u t v /t w can be interpreted as a scaled SNR. 

B. Minimum Mean-Squared Error Estimation 

Next suppose that the priors on f/o and Vo are known. In 
this case, given (117al i and ( I18al ). a natural choice for the factor 
selection functions are 

G u (t,p)=E[U Q \P(t)}, G v (t,q)=E[V \Q(t)}, (32) 

which are the MMSE estimates of the variables. In this 
example, there are no parameters A„ or A,;. We can use the 
initial condition Vj(Q) = E[Vo], so that the initial variable in 
d!5bl > is V(0) = E[Vo]. To analyze the algorithms define 

E u (r) u ) := var([/ |y = ^U + D), (33a) 
£ v (r] v ) := var(Vo | Y = ^%V + D), (33b) 

where D ~ Af(0,l) is independent of Uo and Vo- That is, 
£u(Vu) and £ v (t] v ) are the mean-squared errors of estimating 
Uq and Vo from observations Y with SNRs of r\ u and r\ v . 
The functions £ u (-) and £ v (-) arise in a range of estimation 
problems and the analytic and functional properties of these 
functions can be found in l24l . ||251 . 

Theorem 3: Consider the solutions to the SE equations ( TP7I ). 
CEU, and dT9j under the MMSE selection functions (O and 
initial condition V(0) = E[V ]. Then: 

(a) For all t, the asymptotic correlation coefficients (l26b and 
d27l i satisfy the recursive relationships 

Pu {t+1) = 1 £u(0T v p v (t)/T w ), (34a) 

Pv(t) = 1 £v(T u pu(t)/r w ), (34b) 

with the initial condition p u (0) = (EVb) 2 /r„. 
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(b) If, in addition, £ u (f] u ) and £ v (rj v ) are continuous, then for 
any positive initial condition, p v (0) > 0, as t — > oo, the 
asymptotic correlation coefficients (p u (t), pviff) increase 
monotonically to fixed points (p*, p*) of d34l i with p* > 
0. 

Proof: See Appendix [F] ■ 
Again, we see that we can obtain simple, explicit recursions 
for the asymptotic correlations. Moreover, the asymptotic 
correlations provably converge to fixed points of the SE 
equations. The proof of the theorem also provides expressions 
for the second-order statistics in ( fT~9b to be used in the scalar 
equivalent model. 

C. Zero Initial Conditions 

The limiting condition in part (b) of Theorem[3]requires that 
p v {0) > 0, which occurs when E[Vo] ^ 0. Suppose, on the 
other hand, that E[U ] = E[V ] = 0. Then, the initial condition 
will be V(0) = E[V ] = 0. Under this initial condition, a 
simple set of calculations show that the SE equations (|34| > 
will generate a sequence with p v {t) = p u (t) = for all t. 
Thus, the IterFac algorithm will produce no useful estimates. 

Of course, with zero mean random variables, a more sen- 
sible initial condition is to take v(0) to be some non-zero 
random vector, as is commonly done in power algorithm 
recursions for computing maximal singular vectors. To under- 
stand the behavior of the algorithm under this random initial 
condition, let 

_ iv^vo! 2 .... 

Pv[ ' > - ||v || 2 ||v(i)|| 2 ' ( ' 

where we have explicitly denoted the dependence on the prob- 
lem dimension n. From S2T) . we have that lim.n_i.oo Pv(t, n) = 
p v (t) for all t. Also, with a random initial condition v(0) 
independent of Vo, it can be checked that p v (0, n) = 0(l/n) 
so that 

p v (0) = lim p v (0, n) = 0. 

71— >00 

Hence, from the SE equations d34b . p v (t) — p u (t) = for all 
t. That is, 

lim lim p(t, n) — 0. (36) 

t— >oo n— ¥OG 

This limit suggests that, even with random initial condition, 
the IterFac algorithm will not produce a useful estimate. 

However, it is still possible that the limit in the opposite 
order of ( f36b may be non-zero: 

lim lim p(t, n) > 0. (37) 

n— >oo t— foo 

That is, for each n, it may be possible to obtain a non- 
zero correlation, but the number of iterations for convergence 
increases with n since the algorithm starts from a decreasingly 
small initial correlation. Unfortunately, our SE analysis cannot 
make predictions on limits in the order of (1371 1 . 
We can however analyze the following limit: 
Lemma 2: Consider the MMSE SE equations ([34-b with 
random variables C/ an d sucn that E[Vo] = E[[/ ] = 0. 
For each e > 0, let p%(t) be the solution to the SE equations 
with an initial condition p v (0) = e. Then, 



(a) If t//3t u T v > T w , 

lim lim p e v (t) > 0. (38) 

e— >0 t— foo 

(b) Conversely, if \/]3t u t v < t w , 

lim lim p e Jt) = 0. (39) 

e— >0 t— too 

Proof: See Appendix iGl ■ 
The result of the lemma is somewhat disappointing. The 
lemma shows that \J^t u t v > t w is essentially necessary and 
sufficient for the IterFac algorithm with MMSE estimates to 
be able to overcome arbitrarily small initial conditions and 
obtain an estimate with a non-zero correlation to the true 
vector. Unfortunately, this is the identical to the condition 
PTT l for the linear estimator to obtain a non-zero correlation. 
Thus, the IterFac algorithm with MMSE estimates performs 
no better than simple linear estimation in the initial iterations 
when the priors have zero means. Since linear estimation is 
equivalent to finding maximal singular vectors without any 
particular constraints, we could interpret Lemma [2] as saying 
that the IterFac algorithm under MMSE estimation cannot 
exploit structure in the components in the initial iterations. 
As a result, in low SNRs it may be necessary to use other 
algorithms as an initial condition for IterFac - such procedures, 
however, require further study. 

V. Numerical Simulation 

To validate the SE analysis, we consider a simple case where 
the left factor Uo G R" 1 is i.i.d. Gaussian, zero mean and 
vo € K n has Bernoulli-Exponential components: 

J with prob 1 — A, 

V ° J ~ \ Exp(l) with prob A, ( U) 

which provides a simple model for a sparse, positive vector. 
The parameter A is the fraction of nonzero components and is 
set in this simulation to A = 0.1. Note that these components 
have a non-zero mean so the difficulties of Section IIV-CI are 
avoided. The dimensions are (m,n) = (1000,500), and the 
noise level t w is set according to the scaled SNR defined as 

SNR= 10log w (T u T v /T w ). (41) 

Estimating the vector Vo in this set-up is related to finding 
sparse principal vectors of the matrix A T A for which there 
are large number of excellent methods including [6|, [26]-|30| 
to name a few. These algorithms include methods based on 
thresholding, i!i-regularization and semidefinite programming. 
A comparison of the IterFac against these methods would be 
an interesting avenue of future research. Here, we simply wish 
to verify the SE predictions of the IterFac method. 

The results of the simulation are shown in Fig. Q] which 
shows the simulated and SE-predicted performance of the 
IterFac algorithm with both the linear and MMSE selection 
functions for the priors on u and v. The algorithm is run 
for t = 10 iterations and the plot shows the median of the 
final correlation coefficient p v (t) over 50 Monte Carlo trials at 
each value of SNR. It can be seen that the performance of the 
IterFac algorithm for both the linear and MMSE estimates are 
in excellent agreement with the SE predictions. The correlation 
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Scaled SNR (dB) 



Fig. 1. Simulation of the IterFac algorithm for both the linear selection 
functions in Section IIV-AI (labeled iter-lin) and MMSE selection functions 
in Section IIV-BI (labeled iter-mmse). Plotted are the correlation values after 
10 iterations. The simulated values are compared against the SE predictions. 
Also shown is the simple estimate from the maximal singular vectors of A. 
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Fig. 2. Per iteration performance of the IterFac algorithm for both the linear 
selection functions in Section ITV-AI (IF-lin) and MMSE selection functions in 
Section HV-BI (IF-mmse). The simulated values are compared against the SE 
predictions. 



coefficient of the linear estimator also matches the correlation 
of the estimates produced from the maximal singular vectors of 
A. This is not surprising since, with linear selection functions, 
the IterFac algorithm is essentially an iterative method to 
find the maximal singular vectors. The figure also shows the 
benefit of exploiting the prior on Vn, which is evident from the 
superior performance of the MMSE estimate over the linear 
reconstruction. 

Fig. |2] shows the correlation coefficient as a function of the 
iteration number for the MMSE and linear methods for two 
values of the SNR. Again, we see that the SE predictions are 



closely matched to the median simulated values. In addition, 
we see that we get good convergence within 4 to 8 iterations. 
Based on the SE predictions, this number will not scale with 
dimension and hence the simulation suggests that only a small 
number of iterations will be necessary for even very large 
problems. All code for the simulation can be found in the 
public GAMP sourceforge project PP . 

VI. Limitations and Extensions 

While our initial analysis of the IterFac algorithm is encour- 
aging, there are several issues that can be explored for future 
work: 

a) Extensions to higher rank: The current algorithm only 
works on rank one matrices, while many problems require 
estimation of low-rank matrices. One simple way to extend the 
current method to higher ranks is to perform the estimation 
one column at a time for each matrix factor. In each column 
update, the matrix A would be replaced by the residual from 
subtracting the current values of the other columns. Lemma [T] 
can be easily generalized to show monotonicity in this case. 
However, the state evolution analysis will be more difficult 
to generalize since it would require that the residual error be 
approximately Gaussian. 

b) Unknown priors: The MMSE estimator in Section 
IIV-BI requires exact knowledge of the priors on Uo and Vb 
as well as the Gaussian noise level t w . In many problems 
in statistics, these are not known. There are two possible 
solutions that may be investigated in the future. One method 
is to parameterize the distributions of Uq and Vq and estimate 
these parameters in the MMSE selection functions d32l - 
potentially through an EM type procedure as in (32]. This EM 
type approach with hidden hyperparameters has been recently 
successfully used in a related approximate message passing 
method in ll33ll . The analysis of the such parameter learning 
could possibly be accounted for through the adaptation pa- 
rameters X u (t) and X v (t). A second approach is to assume 
that the distributions of Uq and Vq belong to a known family 
of distributions and then find a min-max solution. Such a 
min-max technique was proposed for AMP recovery of sparse 
vectors in [34|. See also |35|. 

c) Optimality: While the current paper characterizes the 
performance of the IterFac algorithm, it remains open how 
far that performance is to optimal estimation such as the 
joint MMSE estimates of Uo and Vn. We conjecture that the 
asymptotic correlation of the optimal estimates of Uo and 
Vn may also be fixed points to (l34i l. If this conjecture were 
true than the uniqueness of the fixed points would provide a 
testable condition for the optimality of IterFac. The reason to 
suspect this optimality property is by comparison to several 
works that have derived similar scalar equivalent models 
and state evolution equations for regression problems with 
large random measurement matrices. In these works, belief 
propagation analysis of large, random sparse matrices lfT31l . 
lfT71l - |[T9l . as well as replica symmetric analysis of dense i.i.d. 
matrices |36|-|38| suggest that the performance of optimal 
estimators can be described by the same fixed points as the 
limiting points of approximate message passing (AMP). Thus, 



AMP is potentially optimal in these scenarios when the fixed 
points are unique. 

Conclusions and Future Work 

We have presented a computationally-efficient method for 
estimating rank-one matrices in noise. The estimation problem 
is reduced to a sequence of scalar AWGN estimation problems 
which can be performed easily for a large class of priors 
or regularization functions on the coefficients. In the case of 
Gaussian noise, the asymptotic performance of the algorithm 
is exactly characterized by a set of scalar state evolution 
equations which appear to match the performance at moderate 
dimensions well. Thus, the methodology is computationally 
simple, general and admits a precise analysis in certain asymp- 
totic, random settings. Future work include extensions to 
higher rank matrices and handling of the cases where the priors 
are not known. 

Appendix A 
Proof of LemmaQ] 

To prove d6ai >. first observe that 
F(u,v(t)) + ^#||u-u(t)|| 2 



(a) 1 
= II A — 

2m " 

+M*)IMI 2 



2m" w " F 



C[r(u) 
/i u (t)u(t) T u + const 



(6) 1 



-(||v(t)|| 2 /m + M 40)l|u|| 2 + CC /(u) 
- ((l/m)Av(i) + [i u (t)\i(t)) T u + const 



(c) A„(t), 



u|j 2 - p(i) T u + C[/(u) + const (42) 



where in all the equations, the factor "const" refers to a term 
not dependent on u and (a) follows from the definition of the 
objective function in (01; (b) follows from expanding the terms 
in the || A — uv(i) T |||,; and (c) follows from the expression for 
X u (t) in ( f5ab and p(t) in line © of Algorithm!]] The definition 
of G u (t, p) in (Plat now proves d6al l. The minimization d6bl is 
proved similarly. 

To prove (0, we use a standard trick (see, for example the 
use of the auxiliary function in |9|): 

F(u(t+l),v(*)) 

< F(u(m), v(t)) + ^||u(t+l) - u(t)|| 2 

< F(u(t),v(t)) + ^||uW-u(t)|| 2 



= F(u(t),v(t)) 



(43) 



where (a) follows from the assumption that the damping factor 
Hu(t) is non-negative and (b) follows since u(t + l) is the 
solution to the minimization (f6ab . Similarly, one can show 
that, since fJ, v (t) > 0, 

F(u(i+l),v(t+l)) < F(u(t+l),v(t)). (44) 

Together d43j and 04) prove (7). 



Appendix B 

Empirical Convergence of Random Variables 



Bayati and Montanari's analysis in [ 14 1 employs certain de- 
terministic models on the vectors and then proves convergence 
properties of related empirical distributions. To apply the same 
analysis here, we need to review some of their definitions. We 
say a function <j> : W — >• M. s is pseudo-Lipschitz of order 
p > 1, if there exists an L > such for any x, y G K r , 



||0(x)-0(y)|| <L(l + ||x| 



p-i 



r^llx-yll 



Now suppose that for each n — 1,2...., we have a set of 
vectors 

e(n) = {v i (n),i = l ) ...,£(n)} ) 

where the elements are vectors Vj(n) G M. s , and the size of 
the set is given by £(n). We say that the set of components of 
0(n) empirically converges with bounded moments of order p 
as n — > oo to a random vector V on R s if: For all pseudo- 
Lipschitz continuous functions, (j>, of order p, 

£(n) 

lim 7T~\ E ^iin)) = E[0(V)] < oo. (45) 
n-Kx> tin) A — ' 
v ' i—1 

When the nature of convergence is clear, we may write (with 
some abuse of notation) 



lim 



(n) = V. 



Appendix C 

Bayati-Montanari Recursions with Adaptation 

Our main result will need an adaptive version of the 
recursion theorem of Bayati and Montanari fl4l . Let 
H u (t,d,UQ,v u ) and H v (t,b,vo,u v ) be two functions defined 
on arguments t = 0, 1, 2, . . . and d, b, uo and vq G K as well 
as vectors v u and v v . Given a matrix S G R mx ™ and vectors 
u and vo, generate a sequence of vectors b(t) and d(t) by 
the iterations 

b(i) = Sv(t)+e„(t)u(i), (46a) 
d(t) - S T u(t + l) +6,(t)v(t) (46b) 

where 

u<(t+l) = H u (t,bi(t),Uoi,Vu(t)), (47a) 
Wj(t+1) - Hufadjty^oj^vit)), (47b) 

and ^„(<) and are scalar step sizes given by 

1 m 3 

= -—5Z^#t.(«,&i(t),««>, (48a) 
1 n 9 

= -— 5Z^7-ff»(*,di(*)."io,M»(*)). (48b) 

The recursions (06) to 08) are identical to the recursions 
analyzed in lfl4l . except for the introduction of the parameters 
v u (t) and v v (t). We will call these parameters adaptation 
parameters since they enable the functions H u (-) and H v (-) to 
have some data dependence, not explicitly considered in [14J. 
Similar to the selection of the parameters A u (t) and X v (t) 



9 



in < fl~3T >. we assume that, in each iteration t, the adaptation 
parameters are selected by a function of the form, 

1 - 

^(*) = ~2Z0«(*.«oi. (49a) 

n i=i 

v v (t) = — 53^(t,«oi,Ui(t+l)) (49b) 

i=l 

for (possibly vector-valued) functions </>„(•) and (/>„(•) that are 
pseudo-Lipschitz continuous of order p for some p > 1. Thus, 
the values of f u (i) and ^(i) depend on the outputs v(t) 
and u(i+l). Note that in equations d47| i to j49l , dj, u j> 
and voj are the components of the vectors d, Uo, b and Vo, 
respectively. The algorithm is initialized with t = 0, (0) = 
and some vector v(0). 

Now, similar to Section [TTTJ consider a sequence of random 
realizations of the parameters indexed by the input dimension 
n. For each n, we assume that the output dimension m = 
m(n) is deterministic and scales linearly as in ( fTOb for some 
j3 > 0. Assume that the transform matrix S has i.i.d. Gaussian 
components Sy ~ A/"(0, 1/rn). Also assume that the empirical 
limits in (fT~5T > hold with bounded moments of order 2p — 2 for 
some limiting random variables Uo and (Vb,V^(0)). We will 
also assume the following continuity assumptions on H u (-) 
and H v (-)\ 

Assumption 2: The function H u (t,b,UQ,v v ) satisfies the 
following continuity conditions: 

(a) For every v u and t, H u (t,b,uo,v u ) and its derivative 
dH u (t,b,uo,v v ) j db are Lipschitz continuous in b and 
uo for some Lipschitz constant that is continuous in v u \ 
and 

(b) For every v u and t, H u (t, b, uq, v u ) and 
8H u (t,b,uo,v v ) / db are is continuous at v u uniformly 
over (b, uq). 

The function H v (t, d, vo,v v ) satisfies the analogous continuity 
assumptions in d, vo and v v . 

Under these assumption, we will show, as in Section [ill] 
that for any fixed iteration t, the sets 6 u (t) and 9 v (t) in 
(fl4l converge empirically to the limits (fTol l for some random 
variable pairs (Uq, U(t)) and (Vo, V(t)). The random variable 
Uo is identical to the variable in the limit ( 1 1 5ab and, for t > 0, 
U(t) is given by 

U(t+l) = H u (t,B(t),U ,V u (t)), (50) 
B(t)~AA(0,r fc (0), (51) 

for some deterministic constants V v (t) and r b (t) that will be 
defined below. Similarly, the random variable Vo is identical 
to the variable in the limit ( I15bl ) and, for t > 0, is given 
by 

V(t+l) = H u (t, D(t), Vo,V v (t)), 

D(t)~N(i),T d (t)), (52) 

for some constants T> v (t + l) and r d (t). 

The constants t & (£), r d (t), V u (t) and T7 w (i) can be com- 
puted recursively through the following state evolution equa- 



tions 

r d (t) = E[U(t + l) 2 ], (53a) 

r b (t) = /3E[V(t) 2 ] (53b) 

V u (t) = E[cf> u (t,Vo,V(t))} (53c) 

V v (t) - E[Mt,Uo,U(t+l))} (53d) 

where the expectations are over the random variables U (t) and 
V(t) above and initialized with 

r b (0) := ,SE [V(0) 2 ] . (54) 

With these definitions, we can now state the adaptive version 
of the result from Bayati and Montanari |[T4l 

Theorem 4: Consider the recursion in d4oT ) to d49l . Then, 
under the above assumptions, for any fixed iteration number 
t, the sets 9 u (t) and v (t) in ( TT4T > converge empirically to the 
limits ( TToT ) with bounded moments of order p to the random 
variable pairs (Uo,U(t)) and (Vo,V(t)) described above. 

Proof: For each size n, we use an asterisk super- 
script to denote the outputs of a non-adaptive version of 
the recursions d46l ) to i49\ . That is, quantities such as 
u*(t),v*(t),p*(t),q*(t), . . ., will represent the sequence of 
outputs generated by recursions (|46| | to d49l with the same 
initial conditions (v*(0) = v(0) and £ u (0) = £*(0) = 0), 
but where v u (t) and v v (t) are replaced by their deterministic 
limits V u (t) and V v (t) in d47j and (|48V That is, 

uj(t+l) = ff u (t,6*(t),<,z7„(t)), (55a) 
w|(t+l) = ^(t^W.^.^^+l)), (55b) 

and 

1 m 3 

£W = --E^^^^W'<o^«W) (56a) 

2 — 1 1 
1 n ^ 

G(t+i) = — ff„(t,d;(*),u; ,i7„(i+i)).(56b) 

3 = 1 J 

Now, Bayati and Montanari's result in l[T4ll shows that this 
non-adaptive algorithm satisfies the required properties. That 
is, the following limits hold with bounded moments of order 

P, 

lim {(uoi,u*(t)), i = 1,. . . , m} = (U , U(t)) (57a) 

71— >00 

\im{(v 0l ,v*(t)),j = l,...,n} = (Vb,V(«)).(57b) 

n—>oc J 

So, the limits ( TToT l will be shown if we can prove the following 
limits hold almost surely for all t: 

lim — ||u(i) -u*(t)||5 = (58a) 

n— >oo m 

lim -||v(t)-v*(t)K = 0, (58b) 

n— >oo fi 

where || ■ || p is the p-norm. In addition to proving (l58k we 
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will show the following limits hold almost surely, 



1 

lim - bt -b* i 

n— >oo ffi 


= 




(59a) 


lim ±||d(i)-d*(t)||* 

n— >oo n 1 


= 




(59b) 


lim \Ut)-C(t)\ 

n— >oo 


= 




(59c) 


lim |&(f)-£(*)| 

n— >oo 


= 




(59d) 


lim u u (t) 


= v u 


(t) 


(59e) 


lim u v (t) 


= v v 


(*) 


(59f) 



The proof of the limits (1581 and (|59l can be demonstrated via 
induction with the following straightforward (but somewhat 
tedious) continuity argument: 

To begin the induction argument, first note that the non- 
adaptive algorithm has the same initial condition as the adap- 
tive algorithm. That is, v*(0) = v(0) and £ u (0) = £*(0) = 0. 
Also, since £ u (0) = ££(0) = 0, from d46ab . the initial value 
of u(t) does not matter. So, without loss of generality, we can 
assume that the initial condition satisfies u(0) = u*(0). Thus, 
the hmits d58at , d58bl and d59cl hold for t = 0. 

We now proceed by induction. Suppose that the limits (I58ab . 
(I58bb and (I59cb hold almost surely for some t > 0. Since S 
has i.i.d. components with zero mean and variance 1/m, a 
standard result from random matrix theory (e.g. ||39l ) shows 
that its p-norm operator norm is bounded. That is, there exists 
a constant Cs > such that 



limsup ||S|| P < Cs, limsup||S || p < Cs- 

n— >oo n— >oo 

Substituting the bound (f60l > into ( 146ab . we obtain 



(60) 



l|b(t)-b*(t)|| p <||S|| p ||v(t)-v*(t)|| p 

+\Ut)\Mt) - u*(t)|| p + \Ut) - C(t)\\\u*(t)\\ P 
< c s \\v(t)-v*(t)\\ p + \Ut)\\Mt)-u*(t)\\p 
+\Ut)-£(t)\\W*(t)\\p- (6i) 

Now, since p > 1, we have that for any positive numbers a 
and b, 

(a + bf <2 p -\a p + b p ). (62) 



Using the induction hypotheses, (158al >. d58bb and J59cb along 
with and the inequalities (l62l and (l6Tb shows ( |59ab . 

Next, to prove the limit (|59el i, note that since we have 
assumed that </>„(■) is pseudo-Lipschitz continuous of order 
p, there exists a constant L v > such that 

<j> u (t,V0j,Vj(t)) - <t> u (t,V0j,Vj(t)) 



< L v \\vj(t) 
Therefore, from ( 149 at , 

i^w-<wi<- ,r 



«*(*)l p 



|v(«) 



< 



n 



|v(t)-v*(i)||? 



v*(t)||? + 
^(-||v(t) 



|v(t)-v 
-v*(t)||J 



Will 
1/p 



where the last step used Jensen's inequality. Therefore, apply- 
ing the limit (I58bb shows that d59eb holds almost surely. 

Now the limits ( 158bl i and (|59eb together with the continuity 
conditions on H u (-) in Assumption |2] show that ( 15 8 at holds 



almost surely for t + 1 and d59db holds almost surely for 
Using d46bb , the proof of the limit d59bb is similar to the 
proof of (|59ab . These limits in turn show that the limits (|58bb 
and d59cb hold almost surely for t + 1. We have thus shown 
that if d58ab . ( 158bl ) and d59cb hold almost surely for some 
t, they hold for t + 1. Thus, by induction they hold for all 
t. Finally, applying the limits (|57| |. ( 15 8 at and (I58bb and a 
continuity argument shows that the desired limits d!6ab hold 
almost surely. 



Appendix D 
Proof of TheoremQ] 

The theorem directly follows from the adaptive Bayati- 
Montanari recursion theorem, Theorem [4] above, with some 
change of variables. Specifically, let 

S = ^^W, (63) 

where W is the Gaussian noise in the rank one model 
in Assumption |TJb). Since W has i.i.d. components with 
Gaussian distributions JV(0,t w ), the components of S will 
be i.i.d. with distributions Af(0, 1/m). 

Now, using the rank one model for A in (Q]) 



Av(t) = uov^v(t) + y/mWv(t) 
= na v i(t)u + y/mWv(t), 



(64) 

where the last step is from the definition of a v i(t) in ( f23l . 
Substituting ( |64l i into the the update rule for p(t) in line [6] of 
Algorithm [T] we obtain 

p(t) = (l/m)A(i)v(t) +(i u (t)u(t) 

= (l/VS)Wv(i) + KiWuo+^(t)u(t). (65) 

Note that we have used the fact that f3 = n/m. Hence, if we 
define 

b(t) = -=(p(«) - Pa vl {t)u ), (66) 



(67) 
(68) 

(69) 

(70) 
(71) 

(72a) 
(72b) 



then (J63J and ((65) show that 

b(t) = S(t)v(t)+^ u (t)u(t), 

where 

£ u (t) = Hu{t)/y/r^. 

Similarly, one can show that if we define 
1 



d(t) 



: (q(f) - a„i(t)v ), 



then 



where 



d(i) = S(t) T u(t + l)+^(i)v(t), 



Next define the adaptation functions 

<p u (t,v ,v) := (ot ,'/ , a«(^wo,w)) 



(uu ,4>xv (t, u ,u)) 
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which are the adaptation functions in ( TO} with additional 
components for the second-order statistics uuq and vvq. Since 
4>\ u (t, ■) and 4>\ v (t, •) are pseudo-Lipschitz of order p, so are 
<p u (t, •) and <fi v (t, ■). Taking the empirical means over each of 
the two components of 4> u (') and (f> v (-), and applying ( fT3T > and 
(1231 1. we see that if z/ u (t) and v v (t) are defined as in (1491 . 



1 " 

= — 5^&>(*) v oj,Vj{t)) = (avi(t), A„(i))(73a) 



i=i 



1 



m * — ' 

i=l 

(a„i(t+l),A^(t)) 



(73b) 



Therefore, z/„(t) and f„(t) are vectors containing the param- 
eters X u (t) and At,(i) for the factor selection functions in 
lines [7] and QT| of Algorithm Q] as well as the second-order 
statistics a u i(t) and a v \{t). Now, for ^ u = (a v x,X u ) and 
^ti = (diuijA^) define the scalar functions 

H u (t,b,Uo,Vu) ■= G u (t, y/r^b + f3a vl u (h \ u ) (74a) 
H v (t,d,VQ,v v ) := G„(t,y^;ci-|-a„iUo,A w ). (74b) 

Since G u (t,p,X u ) and G v (t,q, X v ) satisfy the continu- 
ity conditions in Assumption Q2 C X H u (t, b, uq, X u ) and 
H v (t,d,VQ, X v ) satisfy Assumption In addition, the com- 
ponentwise separability assumption in (fTTT i implies that the 
updates in lines 171 and [TTI of Algorithm [TJ can be rewritten as 

«i(t+l) = G u (t,pi(t),A„(t)) (75a) 

^(t+1) = G„(i,dj(*),<M*))- (75b) 

Thus, combining d74l > and (fTBT l with the definitions of b(t) 
and d(t) in (f66t and (|69l l, we obtain 



i(t+l) = flu(t,6((t), ««,!/«(*)) 

/(*+!) = H v (t,dj(t),v j,v v (t)). 



(76a) 
(76b) 



Next observe that 



&(*+!) 



(a) 



jU (*+l)/\^ 



\ " 



z -G v {t, qj {t),X v {t)) 
1 n 9 

--V-ff^M.fi),^^)) (77) 



3=1 



where (a) follows from the definition of £ u (t) in (f68b ; (b) is 
the setting for /i u (t + l) in (I12ab : and (c) follows from the 



definition of H v (t,d) in (174bl >. Similarly, one can show that 

1 m d 

Ut) = /2 /wT^fr 6 ^)' "«(*))■ (78) 

i—l 

Equations ([67]), ([70l>, {73]), d74l . ([77) and (|78]l exactly match 
the recursions in equations d46b to ( f49b . Therefore, Theorem H] 
shows that the limits < fT6l > hold in a sense that the sets 9 U (i) and 
9 v (t) converge empirically with bounded moments of order p. 

We next show that the limits U (i) and V(t) on the right- 
hand side of ( [TBI match the descriptions in (1 1 Vat and d 1 8ab . 
First, define a u i(t) and a v ±(t) in ( fT9b and A u (t) and X v (t) 



as in ( f20b . Then, from d72l ), the expectations !/ u (i) and 
in d53cb and ( I53dl > are given by 

= (a„i(t), A„(t)), F t ,(t) = (S„i(t+l),A^(t)). (79) 

Using ([5TJ, < 174at and (|79j, we see that 

I7(t+1) = H u (t,B(t),U ,V u (t)) 

= G u (t, /3a vl (t) Uq + ^B(t) , X u (t)) , (80) 

where B(t) ~ Af(0,T b (t)). Therefore, if we let Z u (t) = 
y/T^B{t), then Z u (t) is zero mean Gaussian with variance 



E [Z 2 u (t)} = T w r\t) ( = 3 /3t w E[V(*) 2 ] l = j ^a^t), 



(M 



where follows from ( |53bt and (b) follows from the definition 
of a„o(i) in CSJl. Substituting Z u [t) = <Jr^B(t) into ([80l) we 
obtain the model for U(t+1) in d!7ab . Similarly, using d52l > and 
d74bb . we can obtain the model V(t+1) in (1 1 8ab . Thus, we have 
proven that the random variables (Uq, U (t)) and (Vo, V'(i)) are 
described by dl7ab and (|18ab , and this completes the proof. 

Appendix E 
Proof of Theorem[2] 

The theorem is proven by simply evaluating the second 
order statistics. We begin with a u \(t+l): 

a ul (t + l) ( => E[U U{t+l)} ( =' X u (t)E[U P(t)} 



X u (t)E [U (pa vl (t)U + Z u (t))} 
X u (t)pT u a v i(t) 



(81) 



where (a) is the definition in ( Tf9l ; (b) follows from ( 1 17ab 
and (f28b ; (c) follows from ( 117bl ); and (d) follows from the 
independence of Z u (t) and Uo and the definition of r u in 
Similarly, one can show that 



u o(t+l) = A„(i) 2 [/^^(i) + /3r u) a„ (i)] 



(82) 



Substituting ( T8~Tb and (T8~2b into d26l i. we obtain the asymptotic 
correlation coefficient 



P«(t+1) 



AJ^r^i) 



A„(i) [/StuQ!^^) + /3r w a^ (*)] t« 
PTua 2 vl {t) 



a„o(*) 

Pr u T v p v (t) 

where the last step follows from (|27| |. This proves ( I29ab . 
A similar set of calculations shows that 

a vl {t+l) = X v (t)T v a ul (t + l) (83a) 
a v0 {t+l) = X v (t) 2 [T v a 2 ul (t+1) +T w a u0 (t)] .(83b) 

Applying these equations into d2"6*i l and ( |27] |. we obtain the 
recursion (|29bl i. Hence, we have proven part (a) of the 
theorem. 

For part (b), we need the following simple lemma. 
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Lemma 3: Suppose that H : [0, 1] — > [0, 1] is continuous 
and monotonically increasing, and x(t) is a sequence satisfy- 
ing the recursive relation 



x(t + l) = H(x(t)), 



(84) 



for some initial condition x(0) € [0, 1]. Then, either x(t) 
monotonically increases or decreases to some x* = H(x*). 
Proof: This can be proven similar to [19, lemma 7]. ■ 
To apply Lemma [5] observe that the recursions d29l show 
that 



Pu(*+1) 



/3T u T v p v (t) 

Pr u T v p v (t) + 



P T u T vPu(t) + T w (T u T v p u (t) + T w ) ' 



So, if we define 

H(p. 



P>tItIp u 



(85) 



W T u T v + T w T u T v )p u + 7-2 ' 

then it follows from (|29j that p u (t + l) = H(p u (t)) for all 
t. By taking the derivative, it can be checked that H(p u ) 
is monotonically increasing. It follows from Lemma [3] that 
Pu(t) -> P* u for some fixed point p* u = H(p* u ) with p* € [0, 1]. 

Now, there are only two fixed point solutions to p* u = 
ff«): P : = 0and 



When 



P'u'v — W 



(86) 



(87) 



then /?* in ( l86l l is not positive, so the only fixed solution in 
[0,1] is p* u — 0. Therefore, when (f87b is not satisfied, /9„(i) 
must converge to the zero fixed point: p u (t) —> 0. 

Now, suppose that (|87T i is satisfied. In this case, we claim 
that p u (t) —> p* u where p* u is in (186b . We prove this claim 
by contradiction and suppose, instead, that converges to 
the other fixed point: p u (t) — > 0. Since Lemma [3] shows that 
p u {t) must be either monotonically increasing or decreasing, 
the only way p u (i) — > is that p u (t) monotonically decreases 
to zero. But, when d87i > is satisfied, it can be checked that for 
p u (t) sufficiently small and positive, p u (t + l) > H(p u (t)). 
This contradicts the fact that p u (t) is monotonically decreas- 
ing, and therefore, p u {t) must converge to the other fixed point 
p* u in iUl. 

Hence, we have shown that when d87l > is not satisfied, 
Pu(t) — > 0, and when (T8~7b is satisfied p u (t) — > p* u in d86l >. 
This is equivalent to the limit in (130ab . The limit ( I30bb is 
proved similarly. 

Appendix F 
Proof of Theorem[3] 

Similar to the proof of Theorem [2] we begin by comput- 
ing the second-order statistics of (Uo,U(t)). Since U(t) = 
E[Uq I P(t — 1)], U(t) must be uncorrelated with the error: 
U{t){U Q - U(t)) = 0. Hence, 

a u0 (t)-a ul {t) = E[U{t)U - U(t)U(t)} = 0, (88) 



and therefore a u o(t) = a u i(t). Now, consider the measure- 
ment P(t) in d!7bb . The SNR in this channel is 



Vu(t) 



P 2 a 2 vl (t) Pr vPv {t) 



/3T w a v0 (t) 



(89) 



Since U(t+1) is the conditional expectation of Uq given P(t), 
the mean-squared error is given by £ u {r]u{t)) defined in (133al >. 
Therefore, 

£ u (r ]u (t))=E[U(t+l)-Uo] 2 

(a) 



(b) 



a u0 (t+l) - 25 u i(i+l) + t u 
t u - a u o(t+T), 



(90) 



where (a) follows from expanding the square and substituting 
in the definitions in ([T9i l and (l25l i: and (b) follows from the 
fact that a u0 {t+l) — a ul (t+l) proven above. We have thus 
proven that 

a u0 (t+ 1) = am (t + 1) = t u - E u (r) u {t)). (91) 
Therefore, the asymptotic correlation coefficient is given by 

(a) a 2 ul {t + l) 



Pu{t+l) 



l-r-'SMt)), (92) 



(6) 



where (a) follows from ( T26l i and (b) follows from d9Tb . 
Substituting in d89b into d92l proves d34al >. The recursion d34bb 
can be proven similarly. 

For the initial condition in the recursion, observe that with 
V(0) = E[Vb], the second order statistics are given by 

5„„(0) = E[^(0) 2 ] = (E[V }) 2 
a vl (0) = E[V V(0)] = (E[V ]) 2 . 

Hence, from (l27T i. the initial correlation coefficient is 

m «gi(0) (E[Vp]) 2 
T v a v o(0) t v 

which agrees with the statement in the theorem. This proves 
part (a). 

To prove part (b), we again use Lemma [3] Define the 
functions 

H u (Pv) ■= 1 - T~ 1 £ u (f3T v p v /T w ) (93a) 
H v (p u ) := 1-t~ 1 £ v (t u p u /t w ), (93b) 

and their concatenation 

H(p v ) = H v (H u ( Pv )). (94) 

From (f34j, it follows that p v (t + 1) = H(p v (t)). Now, 
£u(Vu) and £ v (r) v ) defined in (l33l are the mean-squared errors 
of Uq and Vq under AWGN estimation measurements with 
SNRs rj u and r\ v . Therefore, £ u {j] u ) and £ v (j] v ) must be 
monotonically decreasing in r/ u and r) v . Therefore, H u (p v ) 
and H v (p u ) in d93l > are monotonically increasing functions 
and thus so is the concatenated function H(p v ) in d94l >. Also, 
since the assumption of part (b) is that £ u (i] u ) and £ v {rj v ) 
are continuous, H(p v ) is also continuous. It follows from 
Lemma [3] that p v (t) — > p* where p* is a fixed point of ( 1341 ). 
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It remains to show p* v > 0. Observe that 

£ v (r, v ) = E[V | Y = s/rfoVo + D] 

< var(Fo) = Wo] - (Wo}) 2 = r v (l - p„(0)), 

where (a) follows from the definition of £ v (rj v ) in (I33bb and 
(b) follows from the definition of r v in (l25l l and the initial 
condition p v (0) = (E[V ]) 2 /t v . It follows from (I34bb that 

p„(i+l) = 1 - —£ v (vv(t)) > p v (0). 

Therefore, the limit point p* of p v (t) must satisfy p* > 
Pv(0) > o. 

Appendix G 
Proof of Lemma|2] 

Define the functions H u , H v and H as in d93l and (|94] i 
from the previous proof. We know that p v (t+l) = H(p v (t)). 
When E[Uo] = E[V Q ] = 0, the p v = is a fixed point of the 
update. We can determine the stability of this fixed point by 
computing the derivative of H(p v ) at p v = 0. 

Towards this end, we first use a standard result (see, for 
example, |37|) that, for any prior on Uo with bounded second 
moments, the mean-squared error in J33al > satisfies 



l 



The term r u /(l + rj u T u ) is the minimum mean-squared error 
with linear estimation of {/ from an AWGN noise-corrupted 
measurement Y = ^/rj^Uo + D, D ~ N(0, 1). Equation (l95T l 
arises from the fact that linear estimation is optimal in low 
SNRs - see 11371 for details. Using d95l l, we can compute the 
derivative of H u , 



H' u (0) = -— -^-£ u {Pt vPv /t u 

T u Op v 



P„=0 



fa 



Similarly one can show that 



4(Q) 



/3t u t v 



and hence 



h'(o) = K(o)K(o) = 



(96) 



(97) 



(98) 



We now apply a standard linearization analysis of the 
nonlinear system p v (t + l) = H(p v (t)) around the fixed 
point p v = 0. See, for example, ll40l . If ^/73t u t v < t w 
then H'(0) < 1 and the fixed point is stable. Thus, for any 
p v (0) sufficiently small p v (t) — > 0. This proves part (b) of the 
lemma. 

On the other hand, if \[^t u t v > t w then -ff'(O) > 1 and the 
fixed point is unstable. This will imply that for any p v (0) > 0, 
p v (t) will diverge from zero. But, we know from Theorem [3] 
that p v (t) must converge to some fixed point of p v = H(p v ). 
So, the limit point must be positive. This proves part (a) of 
the lemma. 
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